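Deep learning (DL) based speech enhancement approaches are usually optimized to minimize the distance between clean and enhanced speech features. This often improves speech quality, but such approaches generalize poorly and may not deliver the required speech intelligibility in real noisy conditions. To address these challenges, researchers have explored intelligibility-oriented (I-O) loss functions and the integration of audio-visual (AV) information for more robust speech enhancement (SE). In this paper, we introduce DL-based I-O SE algorithms that exploit AV information, a novel and previously unexplored research direction. Specifically, we present a fully convolutional AV SE model that uses a modified short-time objective intelligibility (STOI) metric as the training cost function. To the best of our knowledge, this is the first work to integrate AV modalities with an I-O loss function. Comparative experimental results show that our proposed I-O AV SE framework outperforms audio-only (AO) and AV models trained with conventional distance-based loss functions in terms of standard objective measures across speakers and noise conditions.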
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the prospect of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Security is under threat in many types of networks, especially in the Internet of Things (IoT) environment, where early detection is required. IoT is a network of real-time devices, such as home automation systems, that can be controlled by open-source Android devices, which makes it an open ground for attackers. Attackers can access the network, initiate different kinds of security breaches, and compromise network control. Therefore, timely detection of the increasing number of sophisticated malware attacks is a key challenge in ensuring credible network protection. In this regard, we have developed a new malware detection framework, Deep Squeezed-Boosted and Ensemble Learning (DSBEL), comprising a novel Squeezed-Boosted Boundary-Region Split-Transform-Merge (SB-BR-STM) CNN and ensemble learning. The proposed STM block employs multi-path dilated convolutional, boundary, and regional operations to capture homogeneous and heterogeneous global malicious patterns. Moreover, diverse feature maps are obtained using transfer learning and multi-path-based squeezing and boosting at the initial and final levels to learn minute pattern variations. Finally, the boosted discriminative features are extracted from the developed deep SB-BR-STM CNN and provided to the ensemble classifiers (SVM, MLP, and AdaBoostM1) to improve the generalization of hybrid learning. The performance of the proposed DSBEL framework and SB-BR-STM CNN has been evaluated against existing techniques on the IOT_Malware dataset using standard performance measures. The evaluation results show 98.50% accuracy, 97.12% F1-score, 91.91% MCC, 95.97% recall, and 98.42% precision. The proposed malware analysis framework is helpful for the timely detection of malicious activity and suggests future strategies.
3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains unclear how best to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, which limits the network's flexibility. Meanwhile, such dual-task settings may easily distract the network and lead to over-fitting in the 3D segmentation task. Because of the network's inflexibility, fused features can only pass through a decoder network, which hurts model performance due to insufficient depth. To alleviate these drawbacks, we argue in this paper that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into the 3D space, aligned with 3D deep semantic features, can lead to better feature fusion. On the one hand, the unidirectional projection forces our model to focus more on the core task, i.e., 3D segmentation; on the other hand, relaxing the bidirectional projection to a unidirectional one enables deeper cross-domain semantic alignment and gives the flexibility to fuse richer and more complex features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
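The idea of projecting 2D features unidirectionally into 3D can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single view, points already expressed in camera coordinates, a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, nearest-pixel lookup, and fusion by simple concatenation. The function names are illustrative only.

```python
import math


def project_to_pixel(point, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to integer pixel coordinates."""
    x, y, z = point
    if z <= 0:  # point is behind the camera, so it is not visible
        return None
    u = math.floor(fx * x / z + cx + 0.5)  # nearest pixel column
    v = math.floor(fy * y / z + cy + 0.5)  # nearest pixel row
    return u, v


def lift_2d_features(points, feat_map, fx, fy, cx, cy):
    """Attach to each 3D point the 2D feature at its projected pixel.

    feat_map is a nested list of shape H x W x C. Points that fall outside
    the image (or behind the camera) receive a zero feature vector.
    """
    height, width = len(feat_map), len(feat_map[0])
    channels = len(feat_map[0][0])
    lifted = []
    for p in points:
        pix = project_to_pixel(p, fx, fy, cx, cy)
        if pix is not None and 0 <= pix[0] < width and 0 <= pix[1] < height:
            u, v = pix
            lifted.append(list(feat_map[v][u]))
        else:
            lifted.append([0.0] * channels)
    return lifted


def fuse(point_feats, lifted_feats):
    """Concatenate per-point 3D features with the lifted 2D features."""
    return [f3d + f2d for f3d, f2d in zip(point_feats, lifted_feats)]


# Toy data: a 2 x 2 feature map with one channel and five 3D points.
feat_map = [[[1.0], [2.0]], [[3.0], [4.0]]]
points = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (2.0, 0.0, 2.0),
          (2.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
point_feats = [[0.1], [0.2], [0.3], [0.4], [0.5]]

lifted = lift_2d_features(points, feat_map, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
fused = fuse(point_feats, lifted)
```

In a multi-view setting, one plausible extension is to average the lifted features over all views in which a point is visible; the mapping stays one-way (2D to 3D), so the 2D and 3D branches do not need symmetric structures.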
Large language models have recently attracted significant attention due to their impressive performance on a variety of tasks. ChatGPT, developed by OpenAI, is one such implementation of a large, pre-trained language model. It has gained immense popularity among early adopters, some of whom go so far as to characterize it as a disruptive technology in many domains. Understanding such early adopters' sentiments is important because it can provide insights into the potential success or failure of the technology, as well as its strengths and weaknesses. In this paper, we conduct a mixed-method study using 10,732 tweets from early ChatGPT users. We first use topic modelling to identify the main topics and then perform an in-depth qualitative sentiment analysis of each topic. Our results show that the majority of early adopters expressed overwhelmingly positive sentiments on topics such as disruptions to software development, entertainment, and exercising creativity. Only a limited percentage of users expressed concerns about issues such as the potential for misuse of ChatGPT, especially on topics such as the impact on education. We discuss these findings by providing specific examples for each topic and then detail the implications of addressing these concerns for both researchers and users.
Malaria is a potentially fatal disease caused by Plasmodium parasites, which are transmitted by female Anopheles mosquitoes and infect red blood cells; it affects millions of people worldwide every year. However, manual screening by specialists in clinical practice is laborious and prone to error. Therefore, a novel Deep Boosted and Ensemble Learning (DBEL) framework, comprising the stacking of new Boosted-BR-STM convolutional neural networks (CNN) and ensemble classifiers, is developed to screen malaria parasite images. The proposed STM-SB-BRNet is based on a new dilated-convolutional block-based split-transform-merge (STM) and feature-map Squeezing-Boosting (SB) ideas. Moreover, the new STM block uses regional and boundary operations to learn the malaria parasite's homogeneity, heterogeneity, and boundary patterns. Furthermore, diverse boosted channels are obtained by employing a new transfer learning-based feature-map SB in STM blocks at the abstract, intermediate, and final levels to learn minute intensity and texture variations of the parasitic pattern. The proposed DBEL framework stacks prominent and diverse boosted channels and provides the discriminative features generated by the developed Boosted-BR-STM to an ensemble of ML classifiers. The proposed framework improves the discrimination ability and generalization of ensemble learning. Moreover, the deep feature spaces of the developed Boosted-BR-STM and customized CNNs are fed into ML classifiers for comparative analysis. The proposed DBEL framework outperforms existing techniques on the NIH malaria dataset, which is enhanced using the discrete wavelet transform to enrich the feature space. The proposed DBEL framework achieved accuracy (98.50%), sensitivity (0.9920), F-score (0.9850), and AUC (0.997), which suggests that it could be used for malaria parasite screening.
Spatial perception is a key task in several robotics applications. In general, it involves the nonlinear estimation of hidden variables that represent the state of the robot and its environment. However, in the presence of outliers, the standard nonlinear least-squares formulation results in poor estimates. Several methods have been considered in the literature to improve the reliability of the estimation process. Most of them are based on heuristics, since guaranteed global robust estimation is generally impractical due to its high computational cost. Recently, general-purpose robust estimation heuristics have been proposed that leverage existing non-minimal solvers available for the outlier-free formulations, without the need for an initial guess. In this work, we propose two similar heuristics backed by Bayesian theory. We evaluate these heuristics in practical scenarios to demonstrate their merits in different applications, including 3D point cloud registration, mesh registration, and pose graph optimization.
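The sensitivity of least squares to outliers can be seen in a toy example. The sketch below is not one of the heuristics proposed in the paper: it is a generic iteratively reweighted least-squares (IRLS) loop with a Cauchy weight function, applied to the simplest possible problem (estimating a scalar location), and the function names and the `scale` parameter are illustrative assumptions.

```python
def cauchy_weight(residual, scale=1.0):
    """Cauchy robust weight: close to 1 for small residuals, near 0 for large ones."""
    return 1.0 / (1.0 + (residual / scale) ** 2)


def robust_location(samples, scale=1.0, iterations=20):
    """Estimate a scalar location by iteratively reweighted least squares.

    Each iteration solves a weighted least-squares problem (a weighted mean)
    with weights computed from the residuals of the previous estimate.
    """
    estimate = sum(samples) / len(samples)  # plain least-squares start
    for _ in range(iterations):
        weights = [cauchy_weight(s - estimate, scale) for s in samples]
        estimate = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
    return estimate


# Four inliers near 1.0 and a single gross outlier at 50.0.
samples = [1.0, 1.1, 0.9, 1.0, 50.0]
plain = sum(samples) / len(samples)  # least-squares estimate, pulled by the outlier
robust = robust_location(samples)    # stays close to the inliers
```

Here the plain mean lands far from every inlier, while the reweighted estimate settles near 1.0. Note that this loop starts from the least-squares solution, so it does depend on an initial guess, unlike the heuristics the abstract describes.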
Brain tumor classification is crucial for clinical analysis and for planning effective treatment. Deep learning models help radiologists analyze tumors accurately and efficiently without manual intervention. However, brain tumor analysis is challenging because of the tumors' complex structure, texture, size, location, and appearance. Therefore, a novel deep residual and regional-based Res-BRNet Convolutional Neural Network (CNN) is developed for effective brain tumor Magnetic Resonance Imaging (MRI) classification. The developed Res-BRNet employs regional and boundary-based operations in a systematic order within the modified spatial and residual blocks. Moreover, the spatial block extracts homogeneity and boundary-defined features at the abstract level. Furthermore, the residual blocks employed at the target level learn local and global texture variations of different classes of brain tumors. The efficiency of the developed Res-BRNet is evaluated on a standard dataset collected from Kaggle and Figshare that contains various tumor categories, including meningioma, glioma, pituitary, and healthy images. Experiments show that the developed Res-BRNet outperforms standard CNN models and attains excellent performance (accuracy: 98.22%, sensitivity: 0.9811, F-score: 0.9841, and precision: 0.9822) on challenging datasets. Additionally, the performance of the proposed Res-BRNet indicates strong potential for medical image-based disease analysis.
Context-sensitive two-point layer 5 pyramidal cells (L5PCs) were discovered as long ago as 1999. However, the potential of this discovery to provide useful neural computation has yet to be demonstrated. Here we show for the first time how a transformative L5PC-driven deep neural network (DNN), termed the multisensory cooperative computing (MCC) architecture, can effectively process large amounts of heterogeneous real-world audio-visual (AV) data using far less energy than the best available 'point' neuron-driven DNNs. A novel highly distributed parallel implementation on a Xilinx UltraScale+ MPSoC device estimates energy savings of up to 245759 $ \times $ 50000 $\mu$J (i.e., 62% less than the baseline model in a semi-supervised learning setup), where a single synapse consumes $8e^{-5}\mu$J. In a supervised learning setup, the energy consumption can potentially be up to 1250x lower (per feedforward transmission) than that of the baseline model. The significantly reduced neural activity in MCC leads to inherently fast learning and resilience against sudden neural damage. This remarkable performance in pilot experiments demonstrates the embodied neuromorphic intelligence of our proposed cooperative L5PC, which receives input from diverse neighbouring neurons as context to amplify the most salient and relevant information for onward transmission, out of the overwhelmingly large multimodal information utilised at the early stages of on-chip training. Our proposed approach opens new cross-disciplinary avenues for future on-chip DNN training implementations and posits a radical shift in current neuromorphic computing paradigms.
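Randomized controlled trials (RCTs) represent the gold standard when developing policy guidelines. However, RCTs are often narrow and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g., from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, i.e., causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and the extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To support settings in which causal effects are transported across datasets, we give conditions under which doubly robust estimates of group average treatment effects are asymptotically normal, even when flexible machine learning methods are used to estimate nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real-world datasets, and show that it compares favorably with standard meta-analysis techniques.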
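The reject-then-extrapolate procedure described above can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes a single validation effect, a two-sided z-test with independent RCT and observational estimates, and a conservative interval formed as the union of the surviving estimators' intervals. All names (`ObsEstimate`, `falsify_and_extrapolate`) and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class ObsEstimate:
    name: str
    val_effect: float  # observational estimate of the validation effect
    val_se: float      # its standard error
    ext_effect: float  # estimate of the extrapolated (non-RCT subgroup) effect
    ext_se: float      # its standard error


def falsify_and_extrapolate(rct_effect, rct_se, candidates, alpha=0.05):
    """Reject estimators that disagree with the RCT, then pool the rest.

    Returns the surviving estimators and a conservative interval for the
    extrapolated effect (None if every estimator is rejected).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    kept = []
    for c in candidates:
        # z-statistic for the gap between observational and RCT estimates,
        # assuming the two estimates are independent.
        z = (c.val_effect - rct_effect) / sqrt(c.val_se ** 2 + rct_se ** 2)
        if abs(z) <= z_crit:
            kept.append(c)
    if not kept:
        return kept, None
    # Union of the individual intervals: widest lower and upper bounds.
    lower = min(c.ext_effect - z_crit * c.ext_se for c in kept)
    upper = max(c.ext_effect + z_crit * c.ext_se for c in kept)
    return kept, (lower, upper)


# Made-up numbers: estimator "B" badly misses the RCT validation effect.
candidates = [
    ObsEstimate("A", val_effect=1.05, val_se=0.1, ext_effect=2.0, ext_se=0.2),
    ObsEstimate("B", val_effect=2.00, val_se=0.1, ext_effect=5.0, ext_se=0.2),
    ObsEstimate("C", val_effect=0.90, val_se=0.1, ext_effect=1.8, ext_se=0.2),
]
kept, interval = falsify_and_extrapolate(1.0, 0.1, candidates)
```

Taking the union rather than, say, an average keeps the interval valid as long as at least one surviving estimator is unbiased, at the cost of a wider interval.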